Neural Network Implementation

This JavaScript implementation of a neural network is based on the Python example published by Victor Zhou on March 3, 2019. The following is a summary of Victor's introductory explanation of neural networks and how to program them.

The building blocks of a neural network are "neurons": essentially black boxes with inputs and outputs. In this introduction all neurons will have two inputs and one output. Within the neuron the inputs are multiplied by weighting factors, then the weighted inputs are added together with a bias.

$$x_1 \to x_1 \times w_1$$

$$x_2 \to x_2 \times w_2$$

$$(x_1 \times w_1) + (x_2 \times w_2) + b$$

Finally the sum is passed through an activation function:

$$y = f((x_1 \times w_1) + (x_2 \times w_2) + b)$$

The activation function is used to turn an unbounded input into a more predictable form. A commonly used activation function is the sigmoid function, which maps numbers in the range $(-\infty, +\infty)$ to $(0, 1)$:

$$f(x) = \frac{1}{1 + e^{-x}}$$

This process of passing inputs forward to produce an output is called feedforward.
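Here is a minimal sketch of a single neuron in JavaScript. The function names and the example numbers are illustrative choices of mine, not taken from the final program:

```javascript
// Sigmoid activation: maps any real number into (0, 1).
function sigmoid(x) {
  return 1 / (1 + Math.exp(-x));
}

// A single neuron: weighted sum of the inputs plus a bias,
// passed through the sigmoid activation.
function neuron(inputs, weights, bias) {
  const total = inputs.reduce((sum, x, i) => sum + x * weights[i], bias);
  return sigmoid(total);
}

// Feedforward with two inputs: y = f(x1*w1 + x2*w2 + b).
console.log(neuron([2, 3], [0, 1], 4)); // sigmoid(7) ≈ 0.999
```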

A neural network is simply a number of neurons connected together into layers. The network we will model programmatically has three layers: an input layer with two inputs, a hidden layer with two neurons, and an output layer. The inputs for the output layer are the outputs from the neurons in the hidden layer. A more complex neural network would have more hidden layers and more neurons in each layer.

So this network we are building will have two inputs, $x_1$ and $x_2$. Each of these will serve as input for both neurons in our hidden layer, $h_1$ and $h_2$. There is one neuron, $o_1$, in our output layer. So we have four weights from the input layer to the hidden layer and two weights from the hidden layer to the output layer.
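Reusing the sigmoid function from the sketch above, a feedforward pass through this whole network might look like the following. The weight and bias names mirror the text; passing them in a single parameter object is my own assumption:

```javascript
// Feedforward through the network:
// two inputs -> hidden layer (h1, h2) -> output neuron (o1).
function feedforward(x1, x2, p) {
  const h1 = sigmoid(p.w1 * x1 + p.w2 * x2 + p.b1);
  const h2 = sigmoid(p.w3 * x1 + p.w4 * x2 + p.b2);
  const o1 = sigmoid(p.w5 * h1 + p.w6 * h2 + p.b3);
  return o1;
}
```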

Victor's example uses weight and height data to predict a person's gender: given data for two women and two men, can we build a model that predicts gender from new weight and height measurements? The following table shows the data we will use.

| Name    | Weight (lb) | Height (in) | Gender |
|---------|-------------|-------------|--------|
| Alice   | 133         | 65          | F      |
| Bob     | 160         | 72          | M      |
| Charlie | 152         | 70          | M      |
| Diana   | 120         | 60          | F      |

It is common to shift the data, normally by subtracting the mean of each variable. Subtracting the mean weight, 141.25, from the weights and the mean height, 66.75, from the heights gives the following revised table.

| Name    | Weight (lb) | Height (in) | Gender |
|---------|-------------|-------------|--------|
| Alice   | -8.25       | -1.75       | F      |
| Bob     | 18.75       | 5.25        | M      |
| Charlie | 10.75       | 3.25        | M      |
| Diana   | -21.25      | -6.75       | F      |
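Here is a quick sketch of this shift in JavaScript, encoding gender as 1 for female and 0 for male to match the encoding used later in the text:

```javascript
// Raw data from the first table; gender encoded as 1 = F, 0 = M.
const data = [
  { name: "Alice",   weight: 133, height: 65, gender: 1 },
  { name: "Bob",     weight: 160, height: 72, gender: 0 },
  { name: "Charlie", weight: 152, height: 70, gender: 0 },
  { name: "Diana",   weight: 120, height: 60, gender: 1 },
];

const mean = (xs) => xs.reduce((a, b) => a + b, 0) / xs.length;
const meanWeight = mean(data.map((d) => d.weight)); // 141.25
const meanHeight = mean(data.map((d) => d.height)); // 66.75

// Shift each sample by subtracting the column means.
const shifted = data.map((d) => ({
  ...d,
  weight: d.weight - meanWeight,
  height: d.height - meanHeight,
}));
```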

Training requires a measure of how "good" the predictions are, so we can track this measure and seek improvement. This measure is called the loss. Here we use the mean squared error (MSE) loss:

$$\text{MSE} = \frac{1}{n} \sum_{i=1}^{n} (y_{\text{true}} - y_{\text{pred}})^2$$

Where $n$ is the number of samples, $y$ is the variable being predicted (gender), $y_{\text{true}}$ is the true value of the variable (1 for female, 0 for male), and $y_{\text{pred}}$ is the network's predicted value of $y$. Thus we are taking the average over all of the squared errors. The better the prediction, the lower the loss. To train a network you adjust the weights and biases to minimize the loss.
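A direct translation of this loss into JavaScript might look like this (the function name is my own choice):

```javascript
// Mean squared error: average of the squared differences
// between the true values and the predictions.
function mseLoss(yTrue, yPred) {
  let sum = 0;
  for (let i = 0; i < yTrue.length; i++) {
    const diff = yTrue[i] - yPred[i];
    sum += diff * diff;
  }
  return sum / yTrue.length;
}

// Example: true genders [1, 0, 0, 1] against an untrained guess of 0.5 each.
console.log(mseLoss([1, 0, 0, 1], [0.5, 0.5, 0.5, 0.5])); // 0.25
```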

If we label the six weights $w_1, \dots, w_6$ and three biases $b_1, b_2, b_3$ per the network described above, then we can write the loss as a multivariable function:

$$L(w_1, w_2, w_3, w_4, w_5, w_6, b_1, b_2, b_3)$$

The partial derivative of $L$ with respect to $w_1$, $\partial L / \partial w_1$, can tell us how the loss will change as $w_1$ changes. Remember we defined $L$ as the loss above.

$$\frac{\partial L}{\partial w_1} = \frac{\partial L}{\partial y_{\text{pred}}} \times \frac{\partial y_{\text{pred}}}{\partial w_1}$$

Because for a single sample with $y_{\text{true}} = 1$ the loss is $L = (1 - y_{\text{pred}})^2$, we have $\frac{\partial L}{\partial y_{\text{pred}}} = -2(1 - y_{\text{pred}})$. We also know that $y_{\text{pred}} = o_1 = f(w_5 h_1 + w_6 h_2 + b_3)$, where $f$ is the sigmoid function.

Since $w_1$ only affects $h_1$ (not $h_2$), we can write

$$\frac{\partial y_{\text{pred}}}{\partial w_1} = \frac{\partial y_{\text{pred}}}{\partial h_1} \times \frac{\partial h_1}{\partial w_1}$$

$$\frac{\partial y_{\text{pred}}}{\partial h_1} = w_5 \times f'(w_5 h_1 + w_6 h_2 + b_3)$$

We can follow the same reasoning for $\frac{\partial h_1}{\partial w_1}$:

$$h_1 = f(w_1 x_1 + w_2 x_2 + b_1)$$

$$\frac{\partial h_1}{\partial w_1} = x_1 \times f'(w_1 x_1 + w_2 x_2 + b_1)$$

The derivative of the sigmoid function, $f'$, is:

$$f'(x) = \frac{e^{-x}}{(1 + e^{-x})^2} = f(x) \times (1 - f(x))$$
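In JavaScript this identity keeps the derivative cheap to compute. Again a sketch; the name dSigmoid is my own:

```javascript
// Derivative of the sigmoid via the identity f'(x) = f(x) * (1 - f(x)).
function dSigmoid(x) {
  const fx = sigmoid(x);
  return fx * (1 - fx);
}
```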

So if we have a dataset that consists only of Alice (weight = -8.25, height = -1.75, gender = 1) and set all of the weights to 1 and the biases to 0, we can feed this through the network:

$$h_1 = f(w_1 x_1 + w_2 x_2 + b_1) = f(-8.25 - 1.75 + 0) = f(-10) = 0.0000454$$

$$h_2 = f(w_3 x_1 + w_4 x_2 + b_2) = f(-10) = 0.0000454$$

$$o_1 = f(w_5 h_1 + w_6 h_2 + b_3) = f(0.0000454 + 0.0000454 + 0) = 0.500$$

So $y_{\text{pred}}$ is 0.500, which means neither male nor female is favored, as expected. Using the formulas above one can calculate $\partial L / \partial w_1$, probably a very small number, but I will leave that as an exercise as I am tired of writing MathML.
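For the curious, here is a sketch that evaluates the chain-rule formulas above numerically for this Alice-only case, reusing the sigmoid and dSigmoid helpers from earlier:

```javascript
// Alice-only example: all weights 1, all biases 0.
const x1 = -8.25, x2 = -1.75;
const h1 = sigmoid(x1 + x2); // w1*x1 + w2*x2 + b1 with w = 1, b = 0
const h2 = h1;               // same inputs and parameters as h1
const o1 = sigmoid(h1 + h2); // y_pred ≈ 0.500

// Chain rule: dL/dw1 = dL/dy_pred * dy_pred/dh1 * dh1/dw1.
const dL_dypred = -2 * (1 - o1);
const dypred_dh1 = 1 * dSigmoid(h1 + h2); // w5 = 1
const dh1_dw1 = x1 * dSigmoid(x1 + x2);

console.log(dL_dypred * dypred_dh1 * dh1_dw1); // ≈ 0.0000936, very small indeed
```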

The next thing to consider is the optimization algorithm used to modify the variables and minimize the loss. The algorithm implemented here is stochastic gradient descent, or SGD. One basically subtracts $\partial L / \partial w_1$, multiplied by a constant $\eta$, from $w_1$ to get a new $w_1$:

$$w_1 \leftarrow w_1 - \eta \frac{\partial L}{\partial w_1}$$

$\eta$ is called the learning rate, as it controls how much we modify $w_1$. The same update applies to every weight and bias.

The process works as follows:

  1. Choose a single sample from the dataset (hence "stochastic").
  2. Calculate all the partial derivatives of the loss with respect to the weights and biases.
  3. Use the update equation above to modify each weight and bias (see the sketch after this list).
  4. Repeat as needed.
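A skeleton of that loop in JavaScript might look like this. The gradient bookkeeping is elided; params and gradients are placeholder names of mine, not from the actual program:

```javascript
// Stochastic gradient descent over the shifted training data.
// eta = 0.1 is a common learning-rate choice, not mandated by the text.
const eta = 0.1;

for (let epoch = 0; epoch < 1000; epoch++) {
  // 1. Choose a single sample at random (the "stochastic" part).
  const sample = shifted[Math.floor(Math.random() * shifted.length)];

  // 2. Feed forward and compute each partial derivative via the
  //    chain-rule formulas above (bookkeeping elided here).
  const grads = gradients(params, sample); // placeholder helper

  // 3. Nudge every weight and bias against its gradient.
  for (const key of Object.keys(params)) {
    params[key] -= eta * grads[key];
  }
}
```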

At this point training runs for 1000 iterations on the data in the table above. The program then predicts the gender of two new individuals: Emily (-7, 3) and Frank (20, 2). The predictions are listed in the two fields below, and a graph shows how the loss improves over the 1000 training iterations.

See Neural Network Playground for a real neural network.




Hit the button to start the learning process:





Improvement in loss through the 1000 iterations